Problems of machine translation have been investigated in the
Soviet Union since 1955.¹ A number of groups are carrying out theoretical
and experimental work in the area of machine translation.

In the Institute of Precision Mechanics and Computer Technology 
of the Academy of Sciences of the USSR (ITM and VT) dictionaries and 
codes of rules (algorithms) have been compiled for machine translation
from English, Chinese, and Japanese into Russian; and a German-Russian
algorithm is being worked out. Experimental translations of individual
passages have been made.² In the work of the ITM and VT group
there is a marked striving for the rapid achievement of immediate, 
practical results. The efforts of this group are directed, not so 
much toward a theoretical comprehension of the general problem of 
machine translation, as toward a careful, detailed investigation of 
linguistic material, especially lexical. Dictionary routines, routines 
for analysis of the sentence in the source language and routines for 
the synthesis of the sentence in the target language are being compiled
in the ITM and VT on the basis of traditional methods of describing
a language.

1 The idea of machine translation was advanced even in the 30's
by the inventor-technician P. P. Smirnov-Troyansky.

2 I. K. Bel'skaya, "Concerning Certain General Problems of Machine
Translation," Abstracts of the Conference on Machine Translation,
Moscow, 1958, pp. 10-14, in the future abbreviated Abstracts.

An essentially different course is being followed by the group
working in the Steklov Mathematical Institute of the Academy of Sciences
(MIAN). The problem of machine translation is being examined here as
part of the larger problem of the automation of thought processes.
The directors of this group regard the effective practical realization
of machine translation only as the result of profound theoretical
research in the area of mathematics and linguistics.

In MIAN three algorithms have been elaborated: French-Russian,
English-Russian, and Hungarian-Russian.¹

During the compilation of the first of these algorithms in 1955-56,
the workers in this group proceeded empirically, i.e., they extracted
the rules of translation for each word from a comparative analysis
of French texts and their Russian translations. In the elaboration
of the English-Russian algorithm, the MIAN group posed for themselves
a more complex problem: to determine the correspondences between
the grammatical structures of two languages. The posing of such a
problem was partially conditioned by the nature of the relationships
of the English and Russian languages: if it was still possible to
build the analysis of a sentence on a morphological basis in translating
a French mathematical text into Russian, such a method did not seem
rational to the MIAN group in the case of English-Russian translations
of similar texts. The problem was also partially conditioned by the
theoretical goal of the director of the group, Professor A. A. Lyapunov:
to work out strictly formal methods of describing languages in order
to attain gradual automation of the whole process of machine translation.

1 See O. S. Kulagina and I. A. Mel'chuk, "Machine Translation from
French to Russian," Voprosy Yazykoznaniya, 1956, No. 5; T. N. Moloshnaya,
"Some Problems of Syntax in Connection with Machine Translation from
English to Russian," VYa, 1957, No. 4.

The theoretical basis for the isolation of typical sentence structures
was the concept of the syntagma (according to de Saussure) or of
the construct (according to Fortunatov). Machine translation, however,
requires a certain modification of this system. In the structural
syntactic analysis proposed by the author of the English-Russian
algorithm worked out at MIAN, T. N. Moloshnaya, constructs consisting
not only of two members but also of many members (constructions
with an absolute participle, etc.) are isolated. Such elementary
structures were called configurations. They are composed of words
classified according to formal signs. The process of analysis consists
in substituting for each configuration its basic word, that is, it is
shortened. In this way, syntactical links are established between the
words of a sentence. Synthesis of the Russian sentence is made by
substituting for a given English configuration the Russian configuration
which corresponds to it and completing it with Russian
words on the basis of the data of the dictionary, more precisely, of
the Russian part of the dictionary, and on the basis of the corresponding
morphological rules. The fact is that the dictionary for machine
translation, as compiled at MIAN during work on the French-Russian
algorithm, consists of two parts: (1) the foreign, containing the
words of the given language (more precisely, their stems, understood
as the graphically invariable parts of a word) with their corresponding
tags (indicating part of speech, idiomatic relationships, government
by preposition and grammatical characteristics), and (2) the
Russian, containing Russian stems and the corresponding information
about them. The Russian part of the dictionary is independent of the
foreign part, so it may be used in translating from various languages.
The rules for the morphological form of a Russian word are also
independent of the language from which the translation is made.
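
The two-part dictionary described above can be pictured with a minimal sketch. Everything concrete in it is an assumption made for illustration: the entry fields, the numeric keys linking the foreign part to the Russian part, and the sample stems are hypothetical and are not taken from the MIAN dictionaries.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ForeignEntry:
    """Entry of a foreign part: a stem with its tags (hypothetical fields)."""
    stem: str
    part_of_speech: str
    russian_key: int  # hypothetical link into the Russian part


@dataclass(frozen=True)
class RussianEntry:
    """Entry of the Russian part: a stem with information about it."""
    stem: str
    part_of_speech: str
    gender: str


# One Russian part, shared by every source language.
RUSSIAN_PART = {
    1: RussianEntry("книг", "noun", "feminine"),
    2: RussianEntry("теорем", "noun", "feminine"),
}

# One foreign part per source language (illustrative entries only).
FOREIGN_PARTS = {
    "english": {
        "book": ForeignEntry("book", "noun", 1),
        "theorem": ForeignEntry("theorem", "noun", 2),
    },
    "french": {
        "livre": ForeignEntry("livre", "noun", 1),
    },
}


def lookup(language, stem):
    """Return the foreign entry and the Russian entry it points to."""
    foreign = FOREIGN_PARTS[language][stem]
    return foreign, RUSSIAN_PART[foreign.russian_key]
```

In this sketch an English and a French stem resolve to the very same Russian entry, which is the sense in which the Russian part can be reused when translating from various languages.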
The significance of the MIAN English-Russian algorithm consisted
in the fact that, in contrast to all preceding algorithms, in which the
analysis of the text under translation is realized in terms of a
translation into Russian (a category of the Russian language is ascribed to
a foreign word), in T. N. Moloshnaya's algorithm the structural-grammatical
analysis of an English sentence proceeds, in principle,
independently of the language into which the text is being translated.
This is extremely important, for an independent analysis opens the way
for the realization of machine translation not only from one concrete
language to another, but also from many languages to many others.
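
A short arithmetic sketch may show why independent analysis matters for translation among many languages. The counting rule is an assumption of this illustration, not a figure from the text: a directly paired scheme needs one algorithm per ordered pair of languages, while independent analysis and synthesis need one analysis and one synthesis per language.

```python
def direct_pair_algorithms(languages):
    """Algorithms needed if every ordered language pair gets its own algorithm."""
    return languages * (languages - 1)


def independent_components(languages):
    """Components needed with one analysis and one synthesis per language."""
    return 2 * languages


# For five languages: 20 paired algorithms versus 10 independent components.
for n in (2, 5, 10):
    print(n, direct_pair_algorithms(n), independent_components(n))
```

For two languages the independent scheme costs more, and the saving appears only as the number of languages grows.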

Several scientific groups are now working along this path opened
up by the efforts of the MIAN group. In the division of applied
linguistics of the Institute of Linguistics of the USSR directed by A. A.
Reformatsky, rules for the analysis and synthesis of a text and an
abstract system of lexical and syntactic correspondences between
various languages are being worked out by I. A. Mel'chuk independently
of translation into a concrete language, all of which should allow
us to do machine translation from several languages to several other
languages (the model of such an intermediary language is being made
on the basis of an analysis of Russian, English, Chinese, French, and
Hungarian). Syntactic analysis lies at the basis of the translation
system being developed by I. A. Mel'chuk; morphological data serve
only as auxiliary data in the establishment of configurations, i.e.,
in bringing out the relationships between words in the source language
and the expression of these relationships by means of the target
language.
In this connection must be mentioned the research on the isolation
and cataloging of the systems of relationships in the Russian language
carried out in close collaboration with I. A. Mel'chuk in the
Laboratory of Electrical Modelling of the All-Union Institute of Scientific
and Technical Information of the State Scientific-Technical Committee
in the Soviet of Ministers of the USSR and of the Academy of Sciences
of the USSR (LE). In Russian mathematical texts the workers of this
laboratory, Z. M. Volotskaya, E. V. Paducheva, I. N. Shelimova, and
A. L. Shumilina, isolated and described about 200 syntagmas
(two-membered constructs in a subordinate relationship) which are essential
in both the analysis and the synthesis of a Russian sentence.

A substantial contribution to the theory of translation algorithms
and their programming was made by O. S. Kulagina (MIAN). She developed
a system of so-called elementary operators, the simplest steps of
which any translation process may consist, and of programs
corresponding to these steps. As a result, significant generalization and
standardization in the process of making algorithms can be attained, all
of which allows us to pose the problem of automation of the programming
of algorithms and then the problem of the automation of their construction.

The Experimental Laboratory of Machine Translation of the Leningrad
State University (ELMP) under the directorship of N. D. Andreyev is
also endeavoring to realize the idea of working out completely
independent methods of analysis and synthesis and of some abstract logical
system making it possible to go from analysis to synthesis, i.e.,
serving as an intermediary language. In this laboratory extensive
material from various linguistic systems is being investigated;
Indonesian-Russian, Arabic-Russian, Hindi-Russian, Japanese-Russian,
Burmese-Russian, Norwegian-Russian, English-Russian, Spanish-Russian
and Turkish-Russian algorithms are being developed. The intermediary
language which N. D. Andreyev is attempting to create is an artificial
language constructed by averaging the phenomena of various languages.
It is regarded as a material language with its lexicon, morphology,
and with its syntax, but with the one peculiarity that it consists of
symbols. In the selection of the categories at the basis of his
symbolization, N. D. Andreyev considers the most frequent phenomena and
also the international prestige of each language.¹

Approved For Release 2009/08/10 : CIA-RDP80T00246A008400260002-9

The system of signs developed in ELMP for the recording of the
intermediary language can be used also for the recording of information
in information machines.

Along with work on the algorithms of machine translation from
foreign languages into Russian and from Russian into foreign languages
being conducted also in the Gorki State University, the following
algorithms are being elaborated: Armenian-Russian and Russian-Armenian
(in the Computation Center of the Academy of Sciences of the Armenian
SSR), Georgian-Russian and Russian-Georgian (in the Institute of
Automation and Telemechanics of the Academy of Sciences of the Georgian
SSR).

1 N. D. Andreyev, "Machine Translation and the Problem of an
Intermediary Language," Voprosy Yazykoznaniya, 1957, No. 5.

In the First Moscow State Institute of Foreign Languages (I MGPIIYa),
where under the directorship of I. I. Revzin theoretical investigations
of the problems of machine translation and of related problems of
linguistic theory of translation and of methodology of teaching foreign
languages have been carried out, the elaboration of Russian-English,
Russian-French, and Russian-Spanish translation algorithms for foreign
policy texts has begun. At the Institute the Machine Translation
Society has been created, at whose meetings theoretical problems are
discussed and an exchange of ideas about the practical problems of the
compilation of the algorithms takes place. The bulletin published
by the Society carries both theoretical and experimental
work connected with the problem of machine translation. In May 1958
the Society convened the First All-Union Conference on Machine
Translation. Seventy-nine institutions were represented at the conference,
including twenty-one institutes of the Academy of Sciences of the USSR
and eight institutes of the Academies of Science of the Union Republics,
eleven universities, and nineteen other institutions of higher learning
in the country. Linguists, mathematicians, and technicians took part
in the work of the conference. At the plenary and sectional meetings
of the conference more than seventy reports and communications were
discussed, devoted to general linguistic problems arising in connection
with the use of language in present-day automatic devices as well as to
special problems of construction of algorithms for machine translation.¹

1 See Abstracts, M., 1958.

The central problem at present confronting linguists working in
the area of machine translation is that of the methods of the formal
description of linguistic structures. Structural methods, particularly
the methods elaborated by descriptive linguistics, offer much of value
for the formal description of language; it was not by accident that the
work of Fries on the structure of the English language proved useful in
working out English configurations. It has become clear, however, that
these methods are inadequate for the formalization of language to the
extent that this is demanded in automatic translation. In connection with this
a search for means of applying mathematical methods to the analysis
of language was begun. With this in mind, in 1956 a seminar on
mathematical linguistics was initiated in the Department of Philology
of the Moscow State University, joining mathematicians and linguists under
the direction of P. S. Kuznetsov, V. V. Ivanov, and V. A. Uspensky.
Here, as well as at the meetings of the Machine Translation Society,
the ideas of applying the methods of mathematical logic and of set
theory to the study of language, suggested by Academician
A. N. Kolmogorov and A. A. Lyapunov, were discussed. Thus, for example,
A. N. Kolmogorov's idea about the possibility of a strict formal
definition of the category of case was expounded and developed (the
work of V. A. Uspensky and, in part, also of R. L. Dobrushin). It is
interesting to note that eight cases can be counted in the declensional
system of the Russian substantive according to this definition.

A method of defining grammatical categories worked out by a student
of Professor Lyapunov, O. S. Kulagina (MIAN), was discussed at the
seminar. This method of definition allows one to obtain, independently
of the concrete features of the language, a classification of words
and a determination of their syntactic relationships. Language in
this conception is regarded as a set of elements: words or, more
exactly, word forms. A finite number of words arranged in a definite
order is called a sentence. Certain sentences are assumed to be marked
(these are sentences constructed according to the norms of the given
language); others are unmarked. According to the criterion of mutual
substitutability of words in the marked sentences, the entire set
of words is broken down into groups of mutually equivalent words.
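
The grouping by mutual substitutability can be tried out on a toy language. The sketch below is an illustration under stated assumptions, not O. S. Kulagina's own formulation: the marked sentences are a small finite list invented here, and two words are treated as equivalent when they occur in exactly the same contexts within that list, which for a finite list is the same as each being replaceable by the other without leaving the marked sentences.

```python
from collections import defaultdict


def contexts(marked_sentences):
    """Map each word to the set of (left, right) contexts it occurs in."""
    found = defaultdict(set)
    for sentence in marked_sentences:
        for i, word in enumerate(sentence):
            found[word].add((sentence[:i], sentence[i + 1:]))
    return found


def equivalence_classes(marked_sentences):
    """Group words whose contexts coincide, i.e. mutually substitutable words."""
    groups = defaultdict(set)
    for word, word_contexts in contexts(marked_sentences).items():
        groups[frozenset(word_contexts)].add(word)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical toy language: four marked sentences, each a tuple of word forms.
MARKED = {
    ("the", "book", "lies"),
    ("the", "thing", "lies"),
    ("a", "book", "lies"),
    ("a", "thing", "lies"),
}

print(equivalence_classes(MARKED))
# [['a', 'the'], ['book', 'thing'], ['lies']]
```

On this toy list the classes come out as article-like words, noun-like words, and the verb, with no grammatical labels supplied in advance.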
In terms of this system a series of definitions corresponding,
in general, to certain traditional morphological categories, for example,
parts of speech, was successfully obtained. The advantage of this
classification consists, however, in the fact that it has been deduced
on the basis of an exact and strictly formal system of definitions.
It is particularly effective for languages with a more symmetrical
system of word forms (for example, for French). In languages like
Russian that do not possess this symmetry, the method of defining a
grammatical category proposed by R. L. Dobrushin can be utilized.

By making use of the criterion of equivalency, the relationships
between the classes of words isolated are also determined. Moreover,
the concept of configuration, mentioned earlier, gets a more exact
definition: a configuration is defined by O. S. Kulagina as that
combination of not less than two words belonging to various
non-intersecting subsets, which can be reduced to one element without any
marked sentence containing this configuration losing its marked quality.
Thus, the combination of words "thick book" in the sentence "the thick
book lies on the table" can be reduced to the element "book" or can
be replaced by the element "thing" or the element "it" without the
sentence ceasing to be marked. The isolation of the configurations
allows one to determine the syntactic structure of the sentence.
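
The reduction test in this definition can be sketched on a toy list of marked sentences. The sentences and the function names are assumptions of this illustration, and the sketch checks only the reduction condition; it does not check that the words of the combination belong to different word classes.

```python
def reductions(sentence, phrase, element):
    """All sentences obtained by replacing one occurrence of phrase by element."""
    size = len(phrase)
    return [
        sentence[:i] + (element,) + sentence[i + size:]
        for i in range(len(sentence) - size + 1)
        if sentence[i:i + size] == phrase
    ]


def is_configuration(phrase, element, marked_sentences):
    """True if phrase occurs and every reduction of it stays marked."""
    if len(phrase) < 2:
        return False
    reduced = [
        result
        for sentence in marked_sentences
        for result in reductions(sentence, phrase, element)
    ]
    return bool(reduced) and all(r in marked_sentences for r in reduced)


# Hypothetical toy language built around the example in the text.
MARKED = {
    ("the", "thick", "book", "lies"),
    ("a", "thick", "book", "lies"),
    ("the", "book", "lies"),
    ("a", "book", "lies"),
    ("the", "thing", "lies"),
    ("a", "thing", "lies"),
}

print(is_configuration(("thick", "book"), "book", MARKED))   # True
print(is_configuration(("thick", "book"), "thing", MARKED))  # True
print(is_configuration(("book", "lies"), "book", MARKED))    # False
```

The last call fails because reducing "book lies" to "book" yields "the book", which is not among the marked sentences of the toy list.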

The set-theory concept of language is strictly deductive and
formal. This is just what determines its importance both for general
linguistics and for machine translation. Naturally, the formalization
of language is possible only to a limited extent. Thus, the concept
of the marked quality of sentences, without which it is impossible to
determine the equivalence of elements and configurations of a language,
will have little effect if it is extended to all functional areas of
language. But in a limited sphere of language (and machine translation
at the present time is considered only within the limits
of scientific and technical prose) this concept is sufficiently exact
and effective. Thus, all sentences which are met in a given field of
scientific literature (in a given language) can be considered marked.

The set-theory conception of language is important in yet another 
respect. Since it allows us to construct and investigate a grammatical 
model, i.e., a simplified analog of actual linguistic relationships, 
this theory opens one of the possible ways for logico-semantic investi- 
gations of language. In this connection we should point to the ideas 
of V. V. Ivanov about the possibility of applying mathematical methods
to the definition of the lexical meaning of words. I note that, con- 
trary to wide-spread opinion, the theory of machine translation is not 
limited to the investigation of language in its formal aspect alone. 

The search for methods of an objective, precise description of the system 
of meanings in language has begun. 

If it is true that complete formalization of an actual language
is hardly accessible, that it is necessary to attain only formal
approximations to actual language, then a statistical evaluation of the
probability of this approximation acquires special importance.¹ On the other
hand, certain phenomena of language do not yield for the time being 
to structural description and can be formally described only statisti- 
cally. 

1 See V. A. Uspensky, "Conference on the Statistics of Speech,"
Voprosy Yazykoznaniya, 1958, No. 1, p. 173.
The quantitative aspect of linguistic phenomena, both lexical and
grammatical, has been considered, as a rule, in all the formulated
algorithms. One should point particularly to the statistical
investigations carried out on Russian language material in the Laboratory of
Electrical Modelling. I have already mentioned the cataloguing of
Russian syntagmas. This work was accompanied by a statistical
investigation of the language of Russian mathematical texts. The results
of this work, conducted by I. A. Mel'chuk, T. N. Moloshnaya, A. L.
Shumilina, Z. M. Volotskaya, and I. N. Shelimova, were, along with
other works, announced at the conference on the statistics of speech
convoked in October 1957 by the Section of Speech of the Commission
on Acoustics of the Academy of Sciences of the USSR and by Leningrad
University. This work is of interest not only in a practical respect.

Its value consists in a true solution to the problem of combining 
statistical and structural methods: a count of linguistic elements 
was carried out by the authors on the basis of a clear-cut definition 
of such concepts as "syntagma," "type of syntagma," etc. As I. I.
Revzin showed in his report presented at the conference mentioned,
the correlation of structural and statistical methods has a two-sided
nature: statistics aids in specifying the structure of language, and
an exact structural definition of the units being counted
insures the proper conduct of the statistical investigation.

A frequency count of dictionary units is important not only in
connection with machine translation. Leaving aside statistical
investigations of problems of general and particular linguistics,¹
which have already become traditional, we shall point to recent works
connected with the use of language in various devices for the storage,
processing, and transmission of information. In reference to the
Russian material we can call attention to the use of methods of machine
translation for the coding of telegraphic and telephonic messages.

1 In this connection one should recall the works in the statistical
investigation of Russian literary works, carried out in the 20's
and 30's by A. I. Peshkovsky, M. Peterson, et al.

It has been established (V. I. Grigor'ev and G. G. Belonogov) that the
size of a telegraph message in Russian can be diminished by 3-4 times
if the telegraphic communication is translated from a letter code into
a dictionary (lexical) code. Statistical investigations have shown
that in the case of such coding 4,000 common words would be sufficient
in order to insure the transmission of 97.5 percent of a general-language
text.
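
A rough calculation suggests where a saving of this order could come from. The figures fed into it are assumptions of this sketch and not data from the cited work: a five-unit telegraph code per character, an average word of seven letters plus one space, and a fixed-length binary number for each of the 4,000 dictionary words. The words falling outside the dictionary, which would still have to be sent letter by letter, are ignored.

```python
import math


def dictionary_code_bits(vocabulary_size):
    """Bits needed to number every word of the dictionary with a fixed-length code."""
    return math.ceil(math.log2(vocabulary_size))


def letter_code_bits(average_letters, bits_per_character=5):
    """Bits needed to spell an average word plus one space, letter by letter."""
    return (average_letters + 1) * bits_per_character


def compression_ratio(vocabulary_size, average_letters, bits_per_character=5):
    """How many times shorter the dictionary code is than the letter code."""
    return letter_code_bits(average_letters, bits_per_character) / dictionary_code_bits(
        vocabulary_size
    )


print(dictionary_code_bits(4000))  # 12
print(letter_code_bits(7))         # 40
print(round(compression_ratio(4000, 7), 2))
```

Under these assumed figures a word costs 12 bits instead of 40, a ratio that falls inside the reported range of 3-4 times; other word lengths or codes would shift the result.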

The problem examined here is connected, for the most part, with 
an analysis of the text under translation. For the Soviet specialists 
the elaboration of effective methods of analysis presented special 
difficulties; they dealt primarily with morphologically poor languages. 
It would be erroneous, however, to assume that the synthesis of the 
Russian sentence did not present any serious difficulties to them. 

By way of illustration we may cite the difficulties arising in the
synthesis of Russian aspectual forms, inasmuch as the category of 
aspect, as is known, permeates the entire Russian verbal system. 

Here two problems of principle arise. In the first place, it is
necessary to find a principle of classification of Russian verbs which
will allow us to obtain for each verb in an absolutely regular way
(by adding or taking away the same letters) all forms of the perfective
as well as of the imperfective aspect. Such work was done by Z. M.
Volotskaya (LE), who obtained three breakdowns of the whole Russian
verbal complex according to method of formation: a) of present tense
forms; b) of past tense forms; and c) of the perfective stem from the
imperfective stem.¹

In the second place — and this task is much more difficult — it is 
necessary to work out the rules of choice of one or the other aspectual 
form. Inasmuch as the tendency towards carrying out the operation 
of synthesis independently from those of analysis has already been 
noted, these rules must be constructed on the basis of contextual data, 
considering, for example, the presence in the sentence of adverbs, 
the character of the combination, etc. In a series of cases one must 
limit oneself only to a probable solution, based on statistics. 

1 See Abstracts, p. 87.

The problem of machine translation from Russian occupies, of course,
Soviet investigators less than the problem of translation into Russian.
But investigative work connected with the analysis of the Russian
sentence has already been begun (chiefly in the Laboratory of Electrical
Modelling, the Division of Applied Linguistics of the Institute of
Linguistics of the Academy of Science of the USSR and in ITM and VT).
From the point of view of general linguistics the work revealing the
redundancy of certain categories of the Russian language is most
interesting. Thus, for example, the category of gender in the Russian
verb, expressed only in the forms in -l of the singular of the past
tense and of the conditional mood, is redundant, unnecessary from the
standpoint of analysis. It is clear (V. N. Vinogradova, the Institute
of Linguistics of the Academy of Science of the USSR) that in scientific
texts the number of verbs with the expressed form of gender comprises
from four to thirty percent and that in the majority of sentences the
verb can be related only to the subject, the only substantive in the
nominative case. In most cases one can disregard the inflection
of the Russian adjective and determine the relationship of the
adjective to the substantive with which it agrees on the basis of the
position of the adjective in the sentence (N. W. Leont'eva and G. H.
Vavilova, the Institute of Linguistics).

Interesting also is the work on the determination of syntactic 
links for the preposition-case groups of the Russian language (I. N.
Shelimova), and also the work on the elaboration of the syntactic 
links for formulas in Russian mathematical texts (M. M. Langleben) — 
by formulas the author means all elements not found in the machine 
dictionary during the processing of the text (mathematical formulas, 
foreign-language citations, surnames, etc.). 

For the analysis of a Russian sentence, it is necessary to char- 
acterize the marks of punctuation. Only in such a way can one find 
the limits of a simple clause within a sentence, isolate its similar 
members, aid the further clarification of the co-relationships of the 
individual parts of a sentence with complex punctuation, determine a 
group of similar members. T. N. Nikolayeva (ITM and VT) conducted
an analysis of polysemantic marks of punctuation (comma, dash, colon)
in Russian.¹

Thus the realization of machine translation presupposes serious 
theoretical investigations, which, in turn, enrich the problems of 
general and particular linguistics. 


1 See Abstracts, pp. 104-107.